1 Derivation of Diffusion Equations

In this technical section, we construct the Kolmogorov equations which determine the dynamics of the probability distribution function $P(x,t)$. In order to do this, we first calculate the transition probabilities between the various states $x \in \{0, \tfrac{1}{N}, \tfrac{2}{N}, \ldots, 1\}$.

Let $T_+(x)$ denote the probability that the system makes a transition from the state with a fraction $x$ of mutators to the state with a fraction $x + \tfrac{1}{N}$ of mutators. This may occur in one of the following two ways:

1. A mutator is selected for birth, a wild-type is selected for death, and no mutation occurs. 

2. A mutator is selected for birth, a wild-type is selected for death, a beneficial mutation occurs, and this mutation is part of the fraction $1 - s$ that is destined for loss by random drift.

Computing these probabilities in the order listed, we arrive at the following expression for $T_+(x)$:

$$\frac{T_+(x)}{r} = x(1-x)(1-\mu_+) + x(1-x)\,\mu_+\alpha_e(1-s) = x(1-x)\left[1 - \mu_+\bigl(1 - \alpha_e(1-s)\bigr)\right] \tag{1}$$

The factor of $r$ on the LHS is just the birth probability per time-step which, according to A1-A3, is common to all members of the population and will soon be scaled out. In a similar way we calculate $T_-(x)$, the probability that the system makes a transition from the state with a fraction $x$ of mutators to the state with a fraction $x - \tfrac{1}{N}$ of mutators. In fact, we may simply interchange $x \leftrightarrow 1 - x$ and $\mu_+ \leftrightarrow \mu_-$ in Eq. (1), which results in

$$\frac{T_-(x)}{r} = x(1-x)\left[1 - \mu_-\bigl(1 - \alpha_e(1-s)\bigr)\right] \tag{2}$$

Within the framework of A1-A3, the population may also make large, non-local transitions to the "absorbing" $x = 0$ and $x = 1$ states if the mutator or wild-type strains produce an advantageous mutant which is marked for fixation. This gives rise to

$$\frac{T_{\rm fix}(x)}{r} = x\,\mu_+\alpha_e s \tag{3}$$

$$\frac{T_{\rm loss}(x)}{r} = (1-x)\,\mu_-\alpha_e s \tag{4}$$



The probability that the population undergoes no change during a timestep is simply what remains:

$$T_0(x) = 1 - T_+(x) - T_-(x) - T_{\rm fix}(x) - T_{\rm loss}(x) \tag{5}$$
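The transition probabilities of Eqs. (1)-(5) can be collected in a few lines of code. The following is a minimal sketch rather than the authors' implementation: the function name `transition_probs` and its argument names are hypothetical, and the birth probability `r` is kept as an explicit parameter with an assumed default of 1.

```python
def transition_probs(x, mu_plus, mu_minus, alpha_e, s, r=1.0):
    """Return (T_plus, T_minus, T_fix, T_loss, T_0) following Eqs. (1)-(5).

    x is the mutator fraction; r is the per-time-step birth probability
    (an assumed default of 1 is used here purely for illustration).
    """
    # Beneficial mutations destined for loss by drift leave the lineage intact.
    kept = alpha_e * (1.0 - s)
    t_plus = r * x * (1.0 - x) * (1.0 - mu_plus * (1.0 - kept))    # Eq. (1)
    t_minus = r * x * (1.0 - x) * (1.0 - mu_minus * (1.0 - kept))  # Eq. (2)
    t_fix = r * x * mu_plus * alpha_e * s                          # Eq. (3)
    t_loss = r * (1.0 - x) * mu_minus * alpha_e * s                # Eq. (4)
    t_zero = 1.0 - t_plus - t_minus - t_fix - t_loss               # Eq. (5)
    return t_plus, t_minus, t_fix, t_loss, t_zero


if __name__ == "__main__":
    # Illustrative parameter values, not taken from the paper.
    print(transition_probs(0.3, 1e-2, 1e-4, 0.4, 0.05))
```

Swapping $x \leftrightarrow 1 - x$ and $\mu_+ \leftrightarrow \mu_-$ in the call exchanges the roles of `t_plus` and `t_minus` (and of `t_fix` and `t_loss`), which is the symmetry used to obtain Eq. (2) from Eq. (1).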

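These transition probabilities allow us to write down the so-called forward and backward Kolmogorov diffusion equations, which describe the time-dependent probability density $P(x,t)$ that the mutator frequency is $x$ at time $t$. The forward equation reads

$$P(x, t+\Delta t) - P(x,t) = -\left[T_+(x) + T_-(x)\right]P(x,t) + T_-\!\left(x + \tfrac{1}{N}\right)P\!\left(x + \tfrac{1}{N}, t\right) + T_+\!\left(x - \tfrac{1}{N}\right)P\!\left(x - \tfrac{1}{N}, t\right) - \left[T_{\rm fix}(x) + T_{\rm loss}(x)\right]P(x,t) \tag{6}$$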

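Taking the continuum limit and plugging in the specific expressions for the transition probabilities, we obtain for the forward equation

$$\frac{\partial P}{\partial t} = \frac{1}{N}\frac{\partial^2}{\partial x^2}\left[x(1-x)P\right] + \left[1 - \alpha_e(1-s)\right](\mu_+ - \mu_-)\frac{\partial}{\partial x}\left[x(1-x)P\right] - N\alpha_e s\left[x\mu_+ + (1-x)\mu_-\right]P \tag{7}$$

where $t$ has been rescaled by $N/r$ so that the units are now "generations." This is Eq. (4) in the main text.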

An approximation to a limited version of Eq. (7) is solved in section 3. However, we can write an equivalent "backward Kolmogorov" equation which is often more mathematically convenient than Eq. (7). Defining $G(x_0, t)$ as the probability that the mutator has been lost by time $t$, we find

$$G(x_0, t+\Delta t) = T_-\,G\!\left(x_0 - \tfrac{1}{N}, t\right) + T_+\,G\!\left(x_0 + \tfrac{1}{N}, t\right) + T_0\,G(x_0, t) + T_{\rm loss}(x_0) \tag{8}$$
The backward equation is primarily useful in its steady state form. Defining $G(x_0, t \to \infty) = G_\infty(x_0)$ and taking the continuum limit, we obtain the ODE

$$0 = \frac{d^2 G_\infty}{dx_0^2} - N(\mu_+ - \mu_-)\left[1 - \alpha_e(1-s)\right]\frac{dG_\infty}{dx_0} - N^2\mu_+\alpha_e s\,\frac{G_\infty}{1 - x_0} + N^2\mu_-\alpha_e s\,\frac{1 - G_\infty}{x_0} \tag{9}$$

This is Eq. (6) from the main text.
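Before the continuum limit is taken, the steady state of Eq. (8) is a tridiagonal linear system for the values $G_\infty(i/N)$, which can be solved directly. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: the function name `loss_probability` is hypothetical, the common factor $r$ is divided out, the boundary values $G_\infty(0) = 1$ and $G_\infty(1) = 0$ are imposed by hand, and the parameter values in the usage lines are arbitrary.

```python
import math


def loss_probability(N, mu_plus, mu_minus, alpha_e, s):
    """Solve the steady state of Eq. (8) for G_i = G_inf(i/N), i = 0..N."""
    kept = alpha_e * (1.0 - s)
    lower, diag, upper, rhs = [], [], [], []
    for i in range(1, N):                      # interior states only
        x = i / N
        t_plus = x * (1 - x) * (1 - mu_plus * (1 - kept))
        t_minus = x * (1 - x) * (1 - mu_minus * (1 - kept))
        t_fix = x * mu_plus * alpha_e * s
        t_loss = (1 - x) * mu_minus * alpha_e * s
        # T_- G_{i-1} + T_+ G_{i+1} - (T_+ + T_- + T_fix + T_loss) G_i = -T_loss
        lower.append(t_minus)
        upper.append(t_plus)
        diag.append(-(t_plus + t_minus + t_fix + t_loss))
        rhs.append(-t_loss)
    rhs[0] -= lower[0] * 1.0                   # boundary G_0 = 1; G_N = 0 adds nothing
    n = N - 1
    for j in range(1, n):                      # Thomas algorithm: forward sweep
        w = lower[j] / diag[j - 1]
        diag[j] -= w * upper[j - 1]
        rhs[j] -= w * rhs[j - 1]
    G = [0.0] * n
    G[-1] = rhs[-1] / diag[-1]
    for j in range(n - 2, -1, -1):             # back substitution
        G[j] = (rhs[j] - upper[j] * G[j + 1]) / diag[j]
    return [1.0] + G + [0.0]


if __name__ == "__main__":
    N, mu_plus, alpha_e, s = 500, 1e-2, 0.4, 0.1   # illustrative values only
    G = loss_probability(N, mu_plus, 0.0, alpha_e, s)
    B = mu_plus * (1 - alpha_e * (1 - s))
    C = mu_plus * alpha_e * s
    S = (math.sqrt(B * B + 4 * C) - B) / 2
    print(G[10], math.exp(-N * S * 10 / N))        # discrete chain vs. exp(-N S x0)
```

For $\mu_- = 0$ and $NS_\mu \gg 1$, the discrete solution at small $x_0$ should lie close to the exponential form $e^{-N S_\mu x_0}$ discussed in section 3, which gives a quick consistency check on the continuum limit.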



2 Limiting Solutions to Eq. (9) when $\mu_- = 0$

As in the main text, we define $B = \mu_+\left[1 - \alpha_e(1-s)\right]$ and $C = \mu_+\alpha_e s$. If $N\alpha_e s \gg 1$ but $\mu_+$ is sufficiently small, $NS_\mu$ is no longer much larger than 1, and the approximations in the main text are not valid. This occurs when $\mu_+ \sim O(1/N^2\alpha_e s)$. In this case, the $B$ term in Eq. (9), and hence deleterious mutations, is irrelevant, and $G_\infty(x_0)$ can be expressed in terms of a modified Bessel function:

$$G_\infty(x_0) = \sqrt{1 - x_0}\;\frac{I_1\!\left(2N\sqrt{C(1-x_0)}\right)}{I_1\!\left(2N\sqrt{C}\right)} \tag{10}$$
When $N\sqrt{C}$ is not large, this does not have the exponential dependence on $Nx_0$ required to interpret the fixation probability as resulting from a true effective selection coefficient. We can nevertheless calculate the fixation probability for small $x_0$:

$$P_{\rm fix}(x_0) \approx N\sqrt{C}\,x_0\,\frac{I_0\!\left(2N\sqrt{C}\right)}{I_1\!\left(2N\sqrt{C}\right)} = N\sqrt{\mu_+\alpha_e s}\;x_0\,\frac{I_0\!\left(2N\sqrt{\mu_+\alpha_e s}\right)}{I_1\!\left(2N\sqrt{\mu_+\alpha_e s}\right)} \tag{11}$$

For $\mu_+ \gg 1/(N^2\alpha_e s)$, the argument of the Bessel function is large, and we recover our previous result: $P_{\rm fix} \approx Nx_0\sqrt{\mu_+\alpha_e s}$. For small argument, we get $P_{\rm fix} \approx x_0(1 + N^2C/2) = x_0(1 + N^2\mu_+\alpha_e s/2)$. Thus the fixation probability approaches the neutral result $x_0$ as $\mu_+ \to 0$ and starts out rising linearly in $\mu_+$. If we wanted to translate this into an effective selection coefficient, since for small $Ns$, $P_{\rm fix}(x_0) \approx x_0(1 + Ns/2)$, the effective selection coefficient would be $S_\mu = N\mu_+\alpha_e s$, whose explicit $N$ dependence again points to the inability to define an effective selection coefficient in this regime.
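The two limits of the Bessel-function ratio quoted above are easy to check numerically. The sketch below is only an illustration: it evaluates $I_0$ and $I_1$ from their power series instead of calling a special-function library, and the names `bessel_i` and `pfix_over_x0` as well as the parameter values are hypothetical.

```python
import math


def bessel_i(order, z, terms=200):
    """Modified Bessel function I_order(z), order 0 or 1, from its power series."""
    term = (z / 2.0) ** order / math.factorial(order)   # k = 0 term
    total = 0.0
    for k in range(terms):
        total += term
        # ratio of consecutive series terms: (z/2)^2 / ((k+1)(k+1+order))
        term *= (z / 2.0) ** 2 / ((k + 1) * (k + 1 + order))
    return total


def pfix_over_x0(N, C):
    """Small-x0 fixation probability divided by x0: N sqrt(C) I_0(z) / I_1(z), z = 2 N sqrt(C)."""
    z = 2.0 * N * math.sqrt(C)
    return N * math.sqrt(C) * bessel_i(0, z) / bessel_i(1, z)


if __name__ == "__main__":
    # Small argument: should approach the near-neutral form 1 + N^2 C / 2.
    print(pfix_over_x0(100, 1e-6), 1 + 100**2 * 1e-6 / 2)
    # Large argument: should approach N sqrt(C).
    print(pfix_over_x0(1000, 4e-4), 1000 * math.sqrt(4e-4))
```

The series is adequate for the moderate arguments used here; for very large arguments an exponentially scaled library routine would be the safer choice.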

When $N\mu_+ \sim O(1)$ and $N^2\mu_+\alpha_e s \sim O(1)$, all the terms in the equation are of the same order, and no approximation can be made. However, for smaller $\mu_+$ one can use perturbation theory to find an approximate solution by writing $G_\infty = 1 - x_0 + \eta(x_0)$, where $\eta(x_0) \ll 1 - x_0$. After dropping terms $\sim NB\eta'$ and $\sim N^2C\eta$, we obtain

$$G_\infty(x_0) \approx 1 - x_0 - \frac{N(CN - B)}{2}\,x_0(1 - x_0) \tag{12}$$

with a fixation probability $P_{\rm fix}(x_0) \approx x_0\left(1 + N(CN - B)/2\right) \approx x_0\left[1 + \tfrac{1}{2}\mu_+N\left(\alpha_e(Ns + 1) - 1\right)\right]$ for $s \ll 1$, which linearly approaches the neutral value $x_0$ as $\mu_+ \to 0$. As above, in this very small $\mu_+$ regime, no mapping to an $N$-independent effective selection coefficient can be made. Note that we again recover our threshold criterion for mutators to be favored (main text Eq. (7)).

3 Approximate Solution to Forward Equation when $\mu_- = 0$

Eq. (7) can be approximately solved if we take $\mu_- = 0$. The equation then reads

$$\frac{\partial P}{\partial t} = \frac{1}{N}\frac{\partial^2}{\partial x^2}\left[x(1-x)P\right] + B\,\frac{\partial}{\partial x}\left[x(1-x)P\right] - NCxP \tag{13}$$

The biological problem we are interested in solving is the fixation probability for a small initial fraction of mutators. This corresponds to solving for $\int_{1-\epsilon}^{1} P(x, t \to \infty)\,dx$ as $\epsilon \to 0$, subject to the initial condition $P(x, 0) = \delta(x - x_0)$, where $x_0 \ll 1$ and $\delta(x - x_0)$ is a Dirac delta function. Further analytic progress can be made if we note that $x$ is in some sense small. The idea is that the probability cloud $P(x,t)$ is initially localized around $x_0 \ll 1$, and that the only process that moves probability solidly into the interior of $x \in (0, 1)$ is random genetic drift. We anticipate this effect to be small when the mutator is significantly favored, i.e. $NS_\mu \gg 1$, and hence $P(x,t) \approx 0$ for $x$ not $\ll 1$. Thus, we can approximately neglect the $O(x^2)$ terms in Eq. (13) and obtain

$$\frac{\partial P}{\partial t} = \frac{1}{N}\frac{\partial^2}{\partial x^2}\left[xP\right] + B\,\frac{\partial}{\partial x}\left[xP\right] - NCxP \tag{14}$$

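This second order PDE in $(x, t)$ can be converted to a first order PDE in $(k, t)$ by taking the spatial Fourier transform $\tilde{P}(k,t) = \int P(x,t)\,e^{-ikx}\,dx$, which yields

$$N\frac{\partial\tilde{P}}{\partial t} = -\left(ik^2 + NBk + iN^2C\right)\frac{\partial\tilde{P}}{\partial k}, \qquad \tilde{P}(k, t = 0) = \exp(-ikx_0) \tag{15}$$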

This equation can be solved by the "method of characteristics", in which we seek curves in the $kt$ plane along which $\tilde{P}(k,t)$ is constant. We find $\frac{d\tilde{P}}{dt} = \frac{\partial\tilde{P}}{\partial t} + \frac{\partial\tilde{P}}{\partial k}\frac{dk}{dt} = 0$ along the family of curves defined by

$$\frac{k - z_+}{k - z_-} = \frac{\kappa - z_+}{\kappa - z_-}\,e^{i(z_+ - z_-)t/N}, \qquad z_\pm = \frac{iNB}{2}\left[1 \pm \sqrt{1 + \frac{4C}{B^2}}\right] \tag{16}$$

$\kappa$ serves to label different characteristic curves and is chosen to appear in this manner so that $k = \kappa$ when $t = 0$. Then, $\tilde{P}(k,t) = \tilde{P}(\kappa, 0) = \exp(-i\kappa x_0)$ along the characteristic curves, and we obtain the formal solution

$$P(x,t) = \frac{1}{2\pi}\int e^{-i\kappa(k,t)x_0}\,e^{ikx}\,dk \tag{17}$$

where $\kappa(k,t)$ is obtained from Eq. (16).

This formidable inversion integral gives the full solution for all $x$ and $t$, but fortunately we do not need to evaluate the integral in order to obtain the fixation probability of the mutator. A moment's reflection convinces us that the $t \to \infty$ behavior of Eq. (14) is the build-up of a delta function at the absorbing state $x = 0$ and a "decay" of the remaining probability to the fixation state. We note that the probability which corresponds to the delta function is the $k \to \infty$ component of $\tilde{P}(k,t)$. Taking the $k \to \infty$ limit of Eq. (16), we obtain

$$P(x = 0, t) = \exp\left[-ix_0z_-\,\frac{z_+/z_- - e^{-i(z_+ - z_-)t/N}}{1 - e^{-i(z_+ - z_-)t/N}}\right]$$

Finally, taking the $t \to \infty$ limit and setting $P(1, t \to \infty) = 1 - P(0, t \to \infty)$, we obtain the familiar expression

$$P(1, t \to \infty) = 1 - e^{-ix_0z_-} = 1 - e^{-Nx_0S_\mu} \tag{18}$$

$$S_\mu = \frac{\mu_+}{2}\left[\sqrt{(1 - \alpha_e)^2 + 4\alpha_e s/\mu_+} - (1 - \alpha_e)\right], \qquad NS_\mu \gg 1 \tag{19}$$

which is the same as Eq. (6, main text) obtained from Eq. (9).



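4 Perturbative Approach to the Effect of $\mu_-$

The small effect of mutations in wild-type backgrounds observed in simulations motivates a perturbative solution to Eq. (9). In terms of the parameters $B_\pm = \mu_\pm\left[1 - \alpha_e(1-s)\right]$ and $C_\pm = \mu_\pm\alpha_e s$, Eq. (9) reads

$$G_\infty'' - N(B_+ - B_-)\,G_\infty' - N^2C_+\frac{G_\infty}{1 - x_0} + N^2C_-\frac{1 - G_\infty}{x_0} = 0$$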
In order to make analytic progress, we make the following assumptions. (i) The mutator is strongly favored, and therefore $\frac{G_\infty}{1 - x_0} \approx G_\infty$. (ii) $G_\infty \approx G_0 + G_1$, where $G_0$ is given by the solution to the case $\mu_- = 0$ and $G_0 \gg G_1$. Then we have

$$G_1''(x_0) - NB_+G_1'(x_0) - N^2C_+G_1(x_0) = -N^2C_-\,\frac{1 - e^{N\left(B_+ - \sqrt{B_+^2 + 4C_+}\right)x_0/2}}{x_0} \tag{20}$$



where we have also dropped the small term $B_-G_1'(x_0)$. This equation can be solved using the theory of non-homogeneous linear differential equations. A convenient way to write the two independent solutions to the homogeneous version of Eq. (20) is

$$g_<(x_0) = e^{NB_+x_0/2}\,\sinh\!\left(\frac{N}{2}\sqrt{B_+^2 + 4C_+}\;x_0\right)$$

$$g_>(x_0) = e^{NB_+x_0/2}\,\sinh\!\left(\frac{N}{2}\sqrt{B_+^2 + 4C_+}\;(1 - x_0)\right)$$

If we denote the inhomogeneity $m(x_0)$, our solution for $G_1(x_0)$ can be written in terms of the integrals

$$G_1(x_0) = g_>(x_0)\int_0^{x_0} m(x)\,\frac{g_<(x)}{Wr(x)}\,dx + g_<(x_0)\int_{x_0}^{1} m(x)\,\frac{g_>(x)}{Wr(x)}\,dx$$

where the Wronskian $Wr(x) = g_>'(x)\,g_<(x) - g_>(x)\,g_<'(x)$. The first-order contribution to the fixation probability for small $x_0$ is then

$$F_1(x_0) \approx -x_0\left.\frac{d}{dx_0}G_1(x_0)\right|_{x_0 = 0}$$



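The Wronskian is evaluated as

$$Wr(x) = -\frac{N}{2}\,e^{NB_+x}\sqrt{B_+^2 + 4C_+}\;\sinh\!\left(\frac{N}{2}\sqrt{B_+^2 + 4C_+}\right)$$

Thus, $g_>(x)/Wr(x)$ decays rapidly for large $x$ as $e^{-N\left(B_+ + \sqrt{B_+^2 + 4C_+}\right)x/2}$. This allows us to simplify the integral by extending the range of integration to infinity, which yields

$$F_1(x_0) \approx -\mu_-\alpha_e sN^2x_0\int_0^\infty dx\,\frac{1 - e^{N\left(B_+ - \sqrt{B_+^2 + 4C_+}\right)x/2}}{x}\;e^{-N\left(B_+ + \sqrt{B_+^2 + 4C_+}\right)x/2}$$

Using the identity

$$\int_0^\infty\frac{e^{-ax} - e^{-bx}}{x}\,dx = \ln(b/a)$$

we finally arrive at

$$F_1(x_0) \approx -\mu_-\alpha_e sN^2x_0\,\ln\!\left(\frac{2\sqrt{1 + 4C_+/B_+^2}}{1 + \sqrt{1 + 4C_+/B_+^2}}\right) \tag{21}$$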

The logarithmic factor varies between zero in the limit $\mu_+ \gg 4\alpha_e s$ and $\ln(2)$ in the opposite limit. This method breaks down when $|F_1| > F_0$. Now, $F_0$ is bounded from above by $Nx_0S_\mu < Nx_0\alpha_e s$, as given in Eq. (11, main text). Therefore, Eq. (21) will typically fail when $\mu_-\alpha_e sN^2 \gtrsim N\alpha_e s$, or $N\mu_- \gtrsim 1$, which is, unfortunately, usually the case.
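The integral identity used to reach Eq. (21) can be checked by direct quadrature. The sketch below is an illustration only: the helper names `frullani_numeric` and `log_factor` are hypothetical, the integral is truncated at a finite upper limit where the integrand is negligible, and the parameter values are arbitrary.

```python
import math


def frullani_numeric(a, b, n=20000):
    """Composite Simpson estimate of the integral of (e^{-a x} - e^{-b x}) / x on [0, inf)."""
    upper = 60.0 / min(a, b)          # beyond this the integrand is below ~e^{-60}

    def f(x):
        if x == 0.0:
            return b - a              # limiting value of the integrand at x = 0
        return (math.exp(-a * x) - math.exp(-b * x)) / x

    h = upper / n                     # n must be even for Simpson's rule
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0


def log_factor(B, C):
    """Logarithmic factor of Eq. (21) written in terms of B_+ and C_+."""
    root = math.sqrt(1.0 + 4.0 * C / B**2)
    return math.log(2.0 * root / (1.0 + root))


if __name__ == "__main__":
    N, mu_plus, alpha_e, s = 5000, 1e-3, 0.4, 0.05   # illustrative values only
    B = mu_plus * (1 - alpha_e * (1 - s))
    C = mu_plus * alpha_e * s
    D = math.sqrt(B * B + 4 * C)
    a, b = N * (B + D) / 2, N * D                    # decay rates of the two exponentials
    print(frullani_numeric(a, b), math.log(b / a), log_factor(B, C))
```

With $a = N(B_+ + \sqrt{B_+^2 + 4C_+})/2$ and $b = N\sqrt{B_+^2 + 4C_+}$, the three printed numbers should agree, which confirms that $\ln(b/a)$ is the logarithm appearing in Eq. (21).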



5 $N_e$ for a Population of Periodically Changing Size

Whereas our model describes a population of constant size, experiments by Sniegowski et al. (1997) were done according to a serial dilution protocol in which a population of size $N_0 \approx 5 \times 10^6$ was grown to size $N_f \approx 5 \times 10^8$, diluted 100 fold, then repeated. Under these dynamics, all lineages grow essentially deterministically from $N_0$ to $N_f$, at which point binomial sampling abruptly reduces the population size back to $N_0$. In this case, the fixation probability $\pi$ of an advantageous mutant depends not only on $s$, but also on when it is generated during the dilution cycle. Mutants that are generated during the early part of the cycle are allowed more time to grow exponentially faster than the wild-type and thus have an advantage over late occurring mutants. It can be shown (Wahl and Gerrish, 2001; Wahl et al., 2002) that the stochastic effects of these population bottlenecks are in many ways equivalent to those of a population with constant size $N_e$. More precisely, if we let $m$ be the number of newly generated mutants that will achieve fixation, then we require the average value of $\frac{dm}{dt}$ to be the same in the two populations. In the bottleneck population, the total number of newly generated individuals is $\nu(t) = N_0(e^{t\ln 2} - 1)$, and $dm = \mu\,\pi(s,t)\,d\nu = N_0\,\mu\,\pi(s,t)\ln(2)\,e^{t\ln 2}\,dt$. In the constant size population, $\frac{dm}{dt} = N_e\mu s$. Equating these two expressions for $\frac{dm}{dt}$ and averaging over one dilution cycle, we obtain

$$N_e = \frac{N_0\ln 2}{g\,s}\int_0^g e^{t\ln 2}\,\pi(s,t)\,dt \tag{22}$$

where $g = \frac{1}{\ln 2}\ln\!\left(\frac{N_f}{N_0}\right) \approx 6.6$ is the number of growth generations separating $N_0$ and $N_f$. For $gs\ln 2 \ll 1$ it can be shown (Wahl and Gerrish, 2001) that $\pi(s,t) \approx 2s\ln(2)\,g\,e^{-t\ln 2}$, and therefore Eq. (22) implies that $N_e \approx 2N_0\,g\ln^2 2 \approx$ 6.3 x 10^[exponent illegible].
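The step from Eq. (22) to the closed form $N_e \approx 2N_0\,g\ln^2 2$ can be reproduced numerically. The sketch below is an illustration only: it reports the ratio $N_e/N_0$ so that no absolute population size has to be assumed, the function name `effective_size_ratio` is hypothetical, and the value of $s$ is arbitrary, subject to $gs\ln 2 \ll 1$.

```python
import math


def effective_size_ratio(g, s, n=1000):
    """N_e / N_0 from Eq. (22), using pi(s, t) ~ 2 s ln(2) g exp(-t ln 2)."""
    ln2 = math.log(2.0)

    def integrand(t):
        pi = 2.0 * s * ln2 * g * math.exp(-t * ln2)   # fixation probability of a mutant born at t
        return math.exp(t * ln2) * pi

    h = g / n
    # Trapezoid rule over one dilution cycle, t in [0, g].
    integral = h * (0.5 * integrand(0.0)
                    + sum(integrand(i * h) for i in range(1, n))
                    + 0.5 * integrand(g))
    return ln2 / (g * s) * integral


if __name__ == "__main__":
    g = math.log(100.0) / math.log(2.0)   # 100-fold dilution, about 6.6 generations
    print(effective_size_ratio(g, 0.01), 2.0 * g * math.log(2.0) ** 2)
```

Because the growth factor $e^{t\ln 2}$ exactly cancels the decay of $\pi(s,t)$, the integrand is constant over the cycle, and the numerical ratio matches $2g\ln^2 2$ to rounding error.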
Table 1: Values of relevant parameters for non-mutators in E. coli, as reported in various references. We assume that all mutation rates are 100x greater in mutators. Mutation rates are per genome per replication. "Selection coefficient" refers to that of advantageous mutations only. Exponents that could not be read are marked [illegible].

Reference                           mu_ben                  mu_del          U                       Selection Coefficient
Hegreness et al. (2006)             2.0 x 10^[illegible]                                            .054
Lenski et al. (1991)                2.8 x 10^-10                                                    .10
Perfeito et al. (2007)              2 x 10^[illegible]                                              .023
Imhof and Schlotterer (2001)        4 x 10^[illegible]                                              .02
Rozen et al. (2002)                 5.9 x 10^-8                                                     .0235
Kibota and Lynch (1996)                                     1.9 x 10^-4
Keightley and Eyre-Walker (1999)                            1.6 x 10^-3
Taddei et al. (1997)                                                        5 x 10^[illegible]
Boe et al. (2000)                                                           5 x 10^[illegible]



6 Detailed Comparison to Experiment

In biological populations, mutants with a spectrum of beneficial effects are generated at specific rates $\mu_{bp}\,\rho(s)\,ds$, where $\rho(s)$ is likely a decreasing function of $s$ (Orr, 2003; Eyre-Walker and Keightley, 2007). The weakest mutants are generated frequently, but are unlikely to achieve fixation because (i) their intrinsic fixation probability $\pi \sim s$ is small, and (ii) in reasonably large populations, several of these mutations exist simultaneously and thus compete with one another. Conversely, stronger mutants are seldom generated, but likely achieve fixation. These conflicting influences result in beneficial mutations of some intermediate size $s[\rho(s), N, \mu_{bp}]$ typically achieving fixation (Gerrish and Lenski, 1998; Desai et al., 2007; Hegreness et al., 2006). These mutants are generated at a per capita rate $\mu_{ben} \sim \mu_{bp}\int_s^\infty \rho(s')\,ds'$. Thus, whenever the population size is large enough for the aforementioned effects to play a strong role, the microscopic parameters $\mu_{bp}$ and $\rho(s)$ result in the macroscopic parameters $s$ and $\mu_{ben}$. These are the parameters that we list in table 2 and plug into our model. This macroscopic viewpoint tightens the connection between our simple model and experimental reality.

Plugging in various parameters from table 2 into ISLA (see main text), we obtain values of $P_{\rm fix}$ in the range

$3.5 \times 10^{-9} < P_{\rm fix,ISLA} <$ 1.0 x 10^[exponent illegible]   (23)

This range for $P_{\rm fix,ISLA}$ is strikingly broad, and results from a correspondingly broad range in the beneficial mutation rate. This rate depends on the particular strain of E. coli used, the environmental conditions, the population size (Gerrish and Lenski, 1998; Perfeito et al., 2007), and exactly which mutations are counted in calculating the beneficial mutation rate.
7 Numerical Integration

In order to produce the solid curves in Figs. (4, 5, 7, 8) from the main text, we first had to numerically integrate Eq. (9) subject to the boundary conditions $G_\infty(0) = 1$ and $G_\infty(1) = 0$. The procedure for the case $\mu_- = 0$ is relatively simple. We initiate integration near the singular point at $x_0 = 1$, taking $G_\infty'(1 - \epsilon) = -1$ and $G_\infty(1 - \epsilon) = \epsilon$. Here, $\epsilon$ is a very small positive number and the initial slope $-1$ is arbitrary. The integration is then performed from $x_0 = 1 - \epsilon$ to $x_0 = 0$ using a fourth order Runge-Kutta algorithm. The resulting trial solution to Eq. (9) does not obey the boundary condition at $x_0 = 0$. However, because the equation is linear, the correct solution is obtained simply by re-scaling the trial solution so that the boundary condition is satisfied. We then evaluate $G_\infty(.001)$ using a cubic spline and obtain $S_\mu$ by inverting Eq. (2, main text) using a root solver.
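The $\mu_- = 0$ procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the names `solve_G` and `G_at` are hypothetical, the step count and parameter values are arbitrary, linear interpolation stands in for the cubic spline, and the final inversion of Eq. (2, main text) is omitted because that equation is not reproduced here.

```python
import math


def solve_G(N, mu_plus, alpha_e, s, eps=1e-6, n_steps=20000):
    """RK4 integration of Eq. (9) with mu_- = 0, started near x0 = 1 and rescaled so G(0) = 1."""
    B = mu_plus * (1.0 - alpha_e * (1.0 - s))
    C = mu_plus * alpha_e * s

    def deriv(y, G, dG):
        # In the variable y = 1 - x0, Eq. (9) becomes G_yy = -N B G_y + N^2 C G / y.
        return dG, -N * B * dG + N * N * C * G / y

    h = (1.0 - eps) / n_steps
    y, G, dG = eps, eps, 1.0   # G(1 - eps) = eps; slope -1 in x0 is +1 in y
    ys, Gs = [y], [G]
    for _ in range(n_steps):
        k1, l1 = deriv(y, G, dG)
        k2, l2 = deriv(y + h / 2, G + h * k1 / 2, dG + h * l1 / 2)
        k3, l3 = deriv(y + h / 2, G + h * k2 / 2, dG + h * l2 / 2)
        k4, l4 = deriv(y + h, G + h * k3, dG + h * l3)
        G += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        dG += h * (l1 + 2 * l2 + 2 * l3 + l4) / 6
        y += h
        ys.append(y)
        Gs.append(G)
    scale = Gs[-1]             # trial value at x0 = 0 (y = 1)
    return ys, [value / scale for value in Gs]


def G_at(x0, ys, Gs):
    """Linearly interpolate the rescaled solution at x0."""
    y = 1.0 - x0
    h = ys[1] - ys[0]
    i = min(int((y - ys[0]) / h), len(ys) - 2)
    w = (y - ys[i]) / (ys[i + 1] - ys[i])
    return (1.0 - w) * Gs[i] + w * Gs[i + 1]


if __name__ == "__main__":
    N, mu_plus, alpha_e, s = 5000, 1e-3, 0.4, 0.05   # illustrative values only
    ys, Gs = solve_G(N, mu_plus, alpha_e, s)
    B = mu_plus * (1 - alpha_e * (1 - s))
    C = mu_plus * alpha_e * s
    S = (math.sqrt(B * B + 4 * C) - B) / 2
    print(G_at(0.01, ys, Gs), math.exp(-N * S * 0.01))
```

Integrating away from $x_0 = 1$ follows the growing solution, so the unwanted second solution is suppressed automatically; the single rescaling at the end is all that is needed to impose $G_\infty(0) = 1$.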

For $\mu_- > 0$, the procedure is slightly more involved. Eq. (9) now has singular points at both $x_0 = 0$ and $x_0 = 1$. Therefore, we must integrate from both the right and the left, then match these two solutions and their derivatives in the middle. Specifically, we first integrate Eq. (9) from the right, as before, but now stopping at $x_0 = .5$. Call this un-scaled solution $G_r(x_0)$. We then generate a trial solution $G_l(x_0)$ initialized near $x_0 = 0$, taking $G_l'(\epsilon) = -NS_0$ and $G_l(\epsilon) = 1 - NS_0\epsilon$. Here, $S_0$ is given by Eq. (10, main text) and merely serves as an initial guess as to the behavior of the solution near $x_0 = 0$. We can ensure that $G_r(.5) = G_l(.5)$ simply by re-scaling $G_r(x_0)$. However, the slopes will, in general, not match at $x_0 = .5$. In order to accomplish this matching, we link the above procedure to a root solver which repeatedly adjusts $G_l'(\epsilon)$ and generates trial solutions until one is found for which $G_l'(.5) = G_r'(.5)$. We then proceed to calculate $S_\mu$ as before, using the correct solution $G_l(x_0)$.

8 Ensemble Averaging 

The point-like symbols in the figures in the main text result from values of $P_{\rm fix}(N, x_0, s, \alpha, \mu_\pm)$ obtained by simulating numerous competition experiments. The averaging procedure varied somewhat, depending on the parameters used, though this had no effect on our results. Here, we explicitly report the averaging details for each case.

• All data from populations of size $N = 5000$ result from 10,000 trials run for each $x_0 \in \{.003, .009, .015, .021\}$. The $P_{\rm fix}$ obtained from each value of $x_0$ was then translated into a value for $S_\mu$ via Eq. (2, main text). These four values were averaged to obtain the values presented in the figures.

• For data from populations of size $N = 1000$, the procedure was identical to the case where $N = 5000$, but with 100,000 trials for each $x_0$.

• For data from populations of size $N = 100{,}000$, the procedure varied slightly between different parameter choices. In Fig. (2, main text) (left) and Fig. (5, main text) we used 20,000 trials each from $x_0 \in \{10^{-4}, 5 \times 10^{-4}\}$. In Fig. (6, main text), we used 20,000 trials from $x_0$ = 2 x 10^[exponent illegible]. In Fig. (2, main text) (right) we used 10,000 trials from two values of $x_0$ whose exponents are illegible.

[Plot for Figure 1; legend entries: "ISLA, A2", "ISLA, A2*", "Simulation".]

Figure 1: The effect of using A2* instead of A2. When $\mu_+/s < 1$, ISLA overestimates the results of simulations when it uses A2. The opposite effect is observed if we instead make the assumption A2*, which immediately kills the fraction $(1 - s)$ of advantageous mutants that are eventually lost to random drift. This suggests that the error accumulated for $\mu_+/s < 1$ is due to the approximate manner in which ISLA treats these advantageous mutants. Parameters are $N = 5000$, $\mu_- = 0$, $\alpha = .4$, $s = 1/120$, $\delta = 0$.



9 Elaboration on A2* 

As mentioned in the main text, A2 is somewhat awkward. An alternative, which we call A2*, is to immediately kill advantageous mutations which are destined to eventually succumb to drift. This approximation merely modifies a coefficient in Eq. (9). The effect is simply a substitution in that coefficient [expression illegible]. In fact, we occasionally made this substitution in the text, when we anticipated that $\alpha_e \ll 1$. Typical behavior of A2 relative to A2* is illustrated in Fig. 1. Even though A2* yields results that are arguably more accurate than those of A2, we preferred A2 in the main text because it nicely serves as an upper bound on mutator success.



10 Fixation and Loss Times when $\mu_- = 0$ and $\mu_- > 0$

As mentioned in the main text, we do not fully understand why ISLA often fails in the weak-effect mutator regime. To further explore this issue, in Fig. 2 we compared the distributions of fixation and loss times for $\mu_- = 0$ and $\mu_- > 0$. We found very little difference in these distributions, suggesting that mutations in the wild-type subpopulation have only minor effects on the fixation process and apparently can be neglected. The mechanism by which mutators succeed despite beneficial mutations in wild-type backgrounds is poorly understood and clearly deserves further attention in future work.

11 Simulations with Very Large s 

Fig. 3 shows that ISLA captures the effect of beneficial mutations in wild-type backgrounds only when $s$ is sufficiently large. When $s = 1/21$, ISLA greatly overestimates the effect of mutations in wild-type backgrounds, whereas the agreement is much better when $s = 1/3$. We do not have a quantitative understanding of how large $s$ must be in order to achieve agreement.
Figure 2: The distributions of fixation and loss times for cases where $P_{\rm fix} \approx 1\%$. The left (right) column shows the distribution of fixation (loss) times. The top row corresponds to $\mu_- = 0$ and the bottom row to $\mu_+/\mu_- = 100$. Notice the logarithmic scale and the extremely long tails on the $t_{\rm loss}$ distributions. The two $t_{\rm loss}$ distributions have the same mean $t_{\rm loss} \approx 40$ generations, which is of the same order as $t_{\rm drift} \approx 92$ generations. The $t_{\rm fix}$ distributions have means $t_{\rm fix} \approx 1300$ generations ($\mu_- = 0$) and $t_{\rm fix} \approx 1400$ generations ($\mu_+/\mu_- = 100$). Since $t_s = \ln(Ns)/s \approx 800$ generations are required for an advantageous mutant to sweep the population, we see that 500 to 600 generations passed before a beneficial mutant destined for fixation was generated. Thus, when mutator fixation occurs, such beneficial mutations are typically generated early compared to $t_{\rm mut} = (\alpha s\mu_+Nx_0)^{-1}$ = 3 x 10^[exponent illegible] but late compared to $t_{\rm drift}$. $S_\mu$ is determined mostly by the probability that the mutator survives the long drift period, and this is barely affected by wild-type beneficial mutant fixation events. Parameters are $N$ = 10^[illegible], $s = 1/120$, $\alpha = .4$, $x_0$ = 10^[illegible], $\delta = 0$, $\mu_+$ = 10^[illegible]. Note that the initial overall mutation rate in the wild-type population is 100x that in the mutator subpopulation.
Figure 3: Simulation data for very large $s$. When $s = 1/21$, ISLA greatly overestimates the effect of mutations in wild-type backgrounds, whereas the agreement is much better when $s = 1/3$. Parameters are $N = 1000$, $\mu_+/\mu_- = 10$, $\alpha_e = .4$, $\delta$ = [value missing].